- Xref: bloom-picayune.mit.edu comp.unix.sysv386:29504 comp.unix.bsd:9884 comp.os.mach:2806 news.answers:4460
- Path: bloom-picayune.mit.edu!enterpoop.mit.edu!spool.mu.edu!wupost!darwin.sura.net!jvnc.net!netnews.upenn.edu!dsinc!bagate!cbmvax!snark!esr
- From: esr@snark.thyrsus.com (Eric S. Raymond)
- Newsgroups: comp.unix.sysv386,comp.unix.bsd,comp.os.mach,news.answers
- Subject: Known Bugs in the USL UNIX distribution
- Message-ID: <1jjQl4#37x4q01WY07X85d4Xb1vMd0g=esr@snark.thyrsus.com>
- Date: 7 Dec 92 19:48:45 GMT
- Expires: 20 Feb 93 00:00:00 GMT
- Sender: esr@snark.thyrsus.com (Eric S. Raymond)
- Followup-To: comp.unix.sysv386
- Lines: 1158
- Approved: news-answers-request@MIT.Edu
-
- Archive-name: usl-bugs
- Last-update: Mon Dec 7 14:40:22 1992
- Version: 9.0
-
- What's new in this issue:
- * Fix availability for ndbm.
-
- (In the table below, bugs new this issue or old bugs for which information has
- been added are marked with a * at the left margin)
-
- 0. Table of Contents
- I. Introduction
- II. General Bugs
- 1. Dropout problems with tty devices
- 2. Suid programs dump core when signalled
- 3. DMAs on large ISA machines may fail
- 4. There is a cylinder limit on disk size
- 5. shmat(2) vs. vfork(2)
- 6. X performance problem
- 7. FIONREAD fails on regular files
- 8. A security hole in login
- 9. COFF problems with long filenames
- 10. Flakeouts in the Wangtek device driver
- 11. A kernel declaration bug
- 12. fread(3) does the wrong thing on pipes and FIFOs
- 13. Process accounting is broken
- 14. tar(1) foos up in the presence of symbolic links
- 15. Symbolic links can interfere with shellscript execution
- 16. Piping a csh builtin causes the shell to hang.
- 17. Quick port setup option in sysadm is broken
- 18. COFF binaries linked with curses(3) and shared libc hang
- 19. shl hangs, sxt devices bad
- 20. num-lock prevents mouse from working properly
- 21. adjtime() doesn't work
- 22. ttymon drops DTR
- 23. cron mail doesn't go through aliasing
- 24. fragility in xterm
- 25. csh lossage due to bad optimization
- 26. Bug in cp(1)
- 27. tbl -me doesn't work
- 28. who -r fragility leads to boot-time problems
- 29. at(1) breaks here-documents in shell scripts
- III. Networking and File-Sharing Bugs
- 1. NFS locking is unusably slow
- 2. UFS file system problems
- 3. Byte-order problem with NFS when accessing Sun disks
- 4. Under weird circumstances, lseek on UFS may cause corruption
- 5. FTP problems
- 6. A bug in the WD80x3 support
- IV. SCSI Support Problems
- 1. sar is confused by SCSI
- 2. A configuration problem
- 3. Synchronous SCSI hang problem
- 4. ps chokes on commands that do SCSI I/O
- 5. Transfer speed problems with Adaptec 1542B on 486s
- V. Development Tools Problems
- 1. General UCB library brokenness
- 2. USL emulation of BSD signals doesn't work
- 3. Possible string library problems
- * 4. USL's ndbm support is broken.
- 5. An include file is missing
- 6. sscanf(3) has a potential bug
- 7. Compiler problems
- VI. The FUBYTE Problem
- VII. Destiny and Dell
-
- I. Introduction
-
- This posting lists known bugs in System V Release 4 implementations, and known
- fixes applied by various porting houses (there's also random bits of
- information about SCO UNIX here and there). It was formerly part of the
- 386-buyers-faq issues 1.0 through 4.0, and is still best read in conjunction
- with the pc-unix/software FAQ descended from that posting.
-
- This document is maintained and periodically updated as a service to the net by
- Eric S. Raymond <esr@snark.thyrsus.com>, who began it for the very best
- self-interested reason that he was in the market and didn't believe in plonking
- down several grand without doing his homework first (no, I don't get paid for
- this, though I have had a bunch of free software and hardware dumped on me as a
- result of it!). Corrections, updates, and all pertinent information are
- welcomed at that address.
-
- This posting is periodically broadcast to the USENET group comp.unix.sysv386
- and to a list of vendor addresses. If you are a vendor representative, please
- check to make sure the information on your company is current and correct. If
- it is not, please email me a correction ASAP. If you are a knowledgeable user
- of any of these products, please send me a precis of your experiences for the
- improvement of future issues.
-
- The bug descriptions often include indications of fixes by the various porting
- houses to their current releases. These are:
-
- Consensys UNIX Version 1.3 abbreviated as "Cons" below
- Dell UNIX Issue 2.2 abbreviated as "Dell" below
- Esix Revision A abbreviated as "Esix" below
- Micro Station Technology SVr4 UNIX abbreviated as "MST" below
- Microport System V Release 4.0 version 4 abbreviated as "uPort" below
- UHC Version 3.6 abbreviated as "UHC" below
- SCO Open DeskTop 1.1 abbreviated as "SCO" below
-
- II. General Bugs
-
- 1. Dropout problems with tty devices
- The most serious problem anyone has reported is that the USL asy driver is
- flaky and occasionally drops characters at speeds above 4800 baud.
- Microport, Dell, Esix, and UHC say that they believe they've fixed this.
- However, Dell, at least, was mistaken when they first made this claim; a more
- detailed description of the problem is given below. I have been assured that
- this is on the fix list for the next Dell release.
- Bela Lubkin at SCO comments "386 interrupt latency vs. unbuffered UARTs.
- This is a tough problem. Nobody's driver should drop characters with a
- turned-on 16550. It's not so easy with a 16450. Anyone with 16450s or lower
- should be able to solve their problems by dropping in a 16550."
-
- 2. Suid programs dump core when signalled
- Mark Snitily of SGCS says that under many SVr4s, signalling a
- process that is running suid root will cause it to core-dump. He says
- Dell and MST have fixed this, and SCO doesn't suffer from this.
-
- 3. DMAs on large ISA machines may fail
- On ISA machines with more than 16MB of RAM, SVr4 may try to do DMA
- from outside the bus's address space, causing serious problems. UNIX ought
- to do an in-memory copy to within the low 16MB but the USL base code doesn't.
- Dell says they've fixed this, and that's been confirmed by a user.
- UHC says they've fixed this; they add that the special buffer-allocation
- logic to handle the problem can be turned off with a tunable kernel parameter
- if you've got less than 16M.
- Microport says they've fixed this in their new 4.1 release, shipping early
- March.
- Esix offers a patch to correct this problem.
- SCO used to have a similar bug but fixed it long ago.
- John Sully <jms@mport.com> writes: "This was due to a bug in pre version 4
- dma code. The USL code has always at least attempted to do a copy from low
- memory to high memory on systems with more than 16Mb of RAM. By the way UHC is
- wrong; the buffer allocation code only comes into play if you have more than
- 16Mb of memory. You can turn it off if you have a machine (ie. an EISA bus)
- which will allow you to do DMA above 16Mb. You *must* have this tunable
- (MAXDMAPAGE) turned on if you are using *ISA* bus masters in a system with more
- than 16Mb of ram. Unfortunately doing this will affect all drivers which do
- dma as there is no good way to do this on a per-driver basis."
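
The check a driver layer has to make before handing a buffer to an ISA
bus master can be sketched as follows. This is only an illustration of
the 16MB rule described above, not USL's code; the names needs_bounce
and ISA_DMA_LIMIT are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ISA DMA has 24 address lines, so only the low 16MB is reachable. */
#define ISA_DMA_LIMIT ((uint32_t)16 * 1024 * 1024)

/* Return 1 if a transfer of len bytes starting at physical address
   phys touches memory at or above 16MB and must therefore be staged
   through a buffer in low memory; return 0 if it can go direct. */
int needs_bounce(uint32_t phys, uint32_t len)
{
    assert(len > 0);
    /* Compare against the remaining room instead of computing
       phys + len, which could overflow 32 bits. */
    return phys >= ISA_DMA_LIMIT || len > ISA_DMA_LIMIT - phys;
}
```

A transfer that ends exactly at the 16MB boundary is still safe; one
byte more and the data has to be copied through low memory.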
-
- 4. There is a cylinder limit on disk size
- Stock USL code is limited to 1,024 cylinders per Winchester, which
- might cause problems with some disk drives.
- Microport, Dell, Esix, MST, and UHC have fixed this.
- Bela Lubkin says "SCO's boot filesystem must lie below 1024 cylinder mark;
- anything else can be anywhere. This is more-or-less a limitation of the BIOS
- interface that the bootstrap loader must use. Could be circumvented by going
- directly to controller hardware in the bootstrap loader, but that would be
- horrendously complex with all the controllers & host adapters to be supported."
- This limit probably applies to all other UNIXes as well.
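
To see what the 1,024-cylinder mark means in bytes, multiply it out for
the geometry your controller reports. A small sketch follows; the
512-byte sector size and the function name are assumptions made for the
example, and the geometry values are whatever your drive or adapter
presents.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CYLINDERS 1024u  /* the limit quoted above */
#define SECTOR_BYTES  512u   /* assumption: usual sector size */

/* Bytes reachable below the cylinder limit for a given geometry. */
uint64_t bytes_below_cylinder_limit(unsigned heads,
                                    unsigned sectors_per_track)
{
    assert(heads > 0 && sectors_per_track > 0);
    return (uint64_t)MAX_CYLINDERS * heads * sectors_per_track
           * SECTOR_BYTES;
}
```

With 16 heads and 63 sectors per track this comes to 528,482,304 bytes,
so a boot filesystem has to end before that point on such a drive.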
-
- 5. shmat(2) vs. vfork(2)
- The shmat(2) call is known to interact badly with vfork(2). Specifically,
- if you attach a shared-memory segment, vfork(), and then the child releases
- the segment, the parent loses it too! Workaround: use fork(2).
- UHC and Microport both suspect that they still have this bug and opine that
- anyone who uses vfork deserves to lose. Dell has no plans to fix it.
-
- John Sully <jms@mport.com> writes: "This is not a bug. It is completely
- consistent with the semantics of a change to the address space of the child.
- Think about it: any change to the address space of a child process created by
- vfork(2) is reflected in the parent since the child is actually executing in
- the parent's address space. Therefore if the child changes the address space
- (in this case by releasing the shared memory segment) what should happen?
- Right, the parent should have the same change happen. And what does happen?
- The segment is released in the parent. One can argue about the braindead
- semantics of vfork(2) all day, but the fact remains that this is exactly what
- one would expect to happen. To quote from the manual page:
-
- [...] vfork differs from fork in
- that the child borrows the parent's *memory* and thread of
- control until a call to execve or an exit (either by a call
- to exit or abnormally.) [ emphasis added ]
-
- and later:
-
- It does not work, however, to return while
- running in the child's context from the procedure which
- called vfork since the eventual return from vfork would then
- return to a no longer existent stack frame.
-
- Please note that the entire address space of the parent is used by
- the child created by vfork(2). The manual page also points out
- several other caveats involved in doing anything to the parent's
- address space except successfully calling an exec family function or
- _exit (note it specifically says *not* to call exit(2)). I do not believe
- that having a shared memory segment disappear from the parent's address
- space is out of line after reading the man page for vfork(2).
-
- It is interesting to note that Sun after implementing its new VM system in
- SunOS 4.0 initially had no plans to support vfork, since they felt that the COW
- semantics of the new fork would provide the necessary efficiency gain. Indeed
- they found that most programs which used vfork worked just fine by doing
- -Dvfork=fork. All that is, except for a certain popular command interpreter
- [ed: can you say C shell?]. So we are stuck with the legacy of this braindead
- system call.
-
- BTW, Microport has no plans to fix this :-)."
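
The fork(2) workaround can be sketched like this. It is a minimal
illustration, not taken from any vendor's code: the function name is
hypothetical, and it assumes System V shared memory is configured on
the machine (it returns -1 if the segment cannot be set up).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Attach a segment, fork, let the child detach it, then check that
   the parent's mapping is intact. Returns 1 on success, -1 if the
   segment or the child could not be created. */
int parent_keeps_segment(void)
{
    int id, status, ok;
    char *p;
    pid_t pid;

    id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id < 0)
        return -1;
    p = shmat(id, NULL, 0);
    if (p == (char *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    strcpy(p, "still here");

    pid = fork();               /* fork, not vfork: separate address space */
    if (pid < 0) {
        shmdt(p);
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    if (pid == 0) {
        shmdt(p);               /* affects only the child's mapping */
        _exit(0);
    }
    waitpid(pid, &status, 0);

    ok = (strcmp(p, "still here") == 0);
    assert(ok);                 /* parent's attachment survived */
    shmdt(p);
    shmctl(id, IPC_RMID, NULL);
    return 1;
}
```

With vfork(2) in place of fork(2), the child's shmdt() would operate on
the borrowed address space, which is the behaviour described above.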
-
- 6. X performance problem
- Stock X11R4 and R5 (at least prior to 1.2E) is said to hog the
- processor if you use the LOCALCONNECT option. Jan Brittenson
- <bson@gnu.ai.mit.edu> posted the following workaround:
-
- I don't know what causes the standard X server to hog the CPU, but
- it can be avoided. Use the following program instead of xinit. Compile
- it with `$CC -O -o xserv xserv.c -lX11' where CC is either
- /usr/ccs/bin/cc or gcc. Set DISPLAY and XINITRC and run `xserv' from
- your home directory. This is just a q&d hack, and not really a
- substitute for xinit -- but it works.
-
- /* xserv.c -- start X server
-
- Start X server. Similar to xinit, but intended to
- circumvent the X386 CPU Hog Mode
-
- Jan Brittenson, June 2 1992 05:15 am */
-
- #include <stdio.h>
- #include <sys/types.h>
- #include <signal.h>
- #include <setjmp.h>
- #include <unistd.h>
- #include <libgen.h>
-
- #include <X11/Xlib.h>
- #include <X11/Xos.h>
- #include <X11/Xmu/SysUtil.h>
-
-
- extern int errno;
-
-
- /* Start X server. Fork-exec server, passing the DISPLAY environment
- variable. Wait for server to get up and running (at which point it
- passes back a SIGUSR1), at which point the user xinitrc file is run. */
-
- #define DEFAULT_XPATH "/usr/X386/bin/X386"
- #define XINITRC ".xinitrc"
- #define DEFAULT_XCOMMAND "xterm -g +1+1 -n login -display :0"
-
- extern void *malloc (), free ();
- extern char *basename (), *getenv (), *strcpy ();
-
- /* X stuff */
- Display *top_display;
-
-
- /* This is supposed to be in libgen.a... */
- static char
- *basename (s0)
- char *s0;
- {
- register char *s1;
-
- for (s1 = s0 + strlen (s0) - 1;
- s1 > s0 && *s1 != '/'; s1--);
-
- if (*s1 == '/')
- return s1+1;
-
- return s1;
- }
-
- jmp_buf sigusr1_frame;
-
- static void
- caught_sigusr1 (int dummy) { longjmp (sigusr1_frame, !0); }
-
-
- static char
- *dispname (s0)
- char *s0;
- {
- register char *s1;
-
- for (s1 = s0 + strlen (s0) - 1;
- s1 > s0 && *s1 != ':'; s1--);
-
- return s1;
- }
-
-
- /* No arguments */
- int
- main (argc, argv)
- int argc;
- char **argv;
- {
- char *xserver_file, *xinitrc_file, *home_path, *display, *display_X_arg;
- int xserver_pid, orgmask;
-
-
- /* Not that it really matters, just to avoid being used as a direct
- replacement for xinit. */
-
- if (argc != 1)
- {
- fprintf (stderr, "usage: %s\n", basename (*argv));
- exit (1);
- }
-
-
- /* Resolve xinitrc path. This is done before the server is
- started. */
-
- if (!(home_path = getenv ("HOME")))
- home_path = "/etc";
-
- if (!(xinitrc_file = getenv ("XINITRC")))
- {
- xinitrc_file = malloc (strlen (home_path) + 1 + strlen (XINITRC) + 1);
- sprintf (xinitrc_file, "%s/%s", home_path, XINITRC);
- }
- else
- xinitrc_file = strdup (xinitrc_file);
-
-
- /* Resolve display */
- if (!(display = getenv ("DISPLAY")))
- display = display_X_arg = ":0.0";
- else
- display_X_arg = dispname (display);
-
-
- /* Tell server to notify us when up and running */
- signal (SIGUSR1, SIG_IGN);
- orgmask = sigblock (sigmask (SIGUSR1));
-
- /* Start server */
- if (!(xserver_pid = vfork ()))
- {
- xserver_file = DEFAULT_XPATH;
-
- execl (xserver_file, xserver_file, display_X_arg, NULL);
-
- fprintf (stderr, "%s: can't exec %s (errno = %d) -- start-up aborted\n",
- basename (*argv), xserver_file, errno);
- _exit (1);
- }
-
- if (xserver_pid < 0)
- {
- fprintf (stderr, "%s: can't fork (errno = %d) -- start-up aborted\n",
- basename (*argv), errno);
-
- exit (1);
- }
-
- /* Await signal from server */
- #if 0
- /* Why the #@$*! doesn't this work?! */
- sigsetmask (orgmask);
- alarm (20);
- sigpause (sigmask (SIGUSR1) | sigmask (SIGALRM));
- #else
- sleep (5);
- #endif
-
- /* Open display */
- if (!(top_display = XOpenDisplay (display)))
- {
- fprintf (stderr, "%s: unable to open display '%s' -- start-up aborted\n",
- basename (*argv), display);
- exit (1);
- }
-
- /* Execute xinitrc file */
- if (system (xinitrc_file) < 0)
- system (DEFAULT_XCOMMAND);
-
- /* Close display */
- XCloseDisplay (top_display);
-
- /* Terminate server */
- kill (xserver_pid, SIGTERM);
-
- /* Finished */
- free (xinitrc_file);
- }
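
The disabled "await signal" branch in the program above can be replaced
by a POSIX-style wait. What follows is a sketch, not part of Jan's
program, and the function names are hypothetical. A possible reason the
original branch fails (an assumption, not confirmed by the author) is
that the parent leaves SIGUSR1 set to SIG_IGN, and that BSD sigpause()
takes the set of signals to keep blocked, so passing SIGUSR1 in the
mask prevents the wakeup. The sketch installs real handlers, blocks
both signals before the fork, and opens them only inside sigsuspend().
The child would still set SIGUSR1 to SIG_IGN before exec, which is how
the X server is told to signal its parent when ready.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t server_ready;  /* set by SIGUSR1 */
static volatile sig_atomic_t timed_out;     /* set by SIGALRM */

static void on_usr1(int sig) { (void)sig; server_ready = 1; }
static void on_alrm(int sig) { (void)sig; timed_out = 1; }

/* Call before fork(): install handlers and block both signals so a
   fast server cannot signal us before we are waiting. The previous
   mask is stored in *saved. Returns 0 on success, -1 on error. */
int ready_wait_prepare(sigset_t *saved)
{
    struct sigaction sa;
    sigset_t block;

    assert(saved != NULL);
    server_ready = 0;
    timed_out = 0;

    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_usr1;
    if (sigaction(SIGUSR1, &sa, NULL) < 0)
        return -1;
    sa.sa_handler = on_alrm;
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigaddset(&block, SIGALRM);
    return sigprocmask(SIG_BLOCK, &block, saved);
}

/* Call in the parent after fork(). Returns 1 if SIGUSR1 arrived
   within the given number of seconds, 0 on timeout. */
int ready_wait(const sigset_t *saved, unsigned seconds)
{
    sigset_t wait_mask = *saved;

    sigdelset(&wait_mask, SIGUSR1);
    sigdelset(&wait_mask, SIGALRM);

    alarm(seconds);
    while (!server_ready && !timed_out)
        sigsuspend(&wait_mask);   /* atomically unblock and sleep */
    alarm(0);
    sigprocmask(SIG_SETMASK, saved, NULL);
    return server_ready ? 1 : 0;
}
```

In the program above this would take the place of the sleep (5), with
ready_wait_prepare() called just before the fork.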
-
- 7. FIONREAD fails on regular files
- Christoph Badura <bad@generics.ka.sub.org> reports that the FIONREAD ioctl()
- fails on regular (disk) files. He has sent USL a one-line kernel fix.
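
Until a port picks up the kernel fix, a program that only needs the
count for regular files can compute it itself. A minimal sketch, with a
hypothetical function name, using nothing but fstat() and lseek():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return the number of bytes between the current offset and end of
   file, or -1 if fd is not a regular file or a call fails. */
long regular_file_bytes_left(int fd)
{
    struct stat st;
    off_t pos;

    assert(fd >= 0);
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode))
        return -1;
    pos = lseek(fd, 0, SEEK_CUR);   /* query position without moving */
    if (pos < 0)
        return -1;
    return st.st_size > pos ? (long)(st.st_size - pos) : 0;
}
```

For pipes, ttys, and sockets the function returns -1, and FIONREAD
remains the right tool there.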
-
- 8. A security hole in login
- David Wexelblat <dwex@mtgzfs3.att.com> reports: "There is a HUGE security
- hole in /bin/login in all USL derived SVR4s before 4.0.4. Refer to CERT
- advisory CA-91:08, dated 5/23/91. This is known to be present in AT&T SVR4
- 2.1, and Microport SVR4 3.1. ESIX claims to have fixed it, Microport reports
- that it is fixed in 4.1. I won't give any more details unless necessary.
- Suffice to say that this bug allows any non-privileged user on an SVR4 system
- to get read-write access to any file on the system."
-
- 9. COFF problems with long filenames
- A source at Dell urges: "Our SVR4v2 did some stuff that USL didn't get
- around to until SVR4v4. Try Dell UNIX 2.1 with a COFF program on a large UFS
- filesystem in a directory with long names. Runs on Dell UNIX. Breaks on
- others." I don't have more definite info yet.
-
- 10. Flakeouts in the Wangtek device driver
- Dell reports that USL's Wangtek device driver is seriously flaky. "How'd
- you like a multi volume backup where the second and subsequent volumes don't
- follow on from the previous volumes?" UHC confirms this and is actively
- working on the problem.
- An anonymous SCOer says "The QIC02 tape controller `standard' is seriously
- flaky. Our driver's in pretty good shape but nobody will ever have a truly
- solid driver that supports every QIC02 controller you can find."
- Gordon Ross <gwr@mc.com> reports: "Actually, the SCSI tape target driver
- `st01' has a similar problem at version 4.0.3 which I corrected while I worked
- on the SVR4 code. The correction was provided to the support group at USL.
- The actual problem was that the SCSI tape would return a `check status'
- completion code which was just trying to inform the driver of the arrival
- of the `logical end of media' indication but the driver was treating it
- as an error. The tape drive had in fact written the data, but the driver
- incorrectly assumed that the "check status" return meant that it failed.
- The result of this is that when you write into the end of the tape, you
- can read back one more "chunk" than you wrote. Of course, cpio does not
- like this at all when doing multi-volume backups..."
-
- 11. A kernel declaration bug
- A botch in USL's /etc/conf/pack.d/kernel/space.c (which is present in
- Consensys 1.3, Dell 2.1, Esix 4.0.3A, Microport 4.0.3 and 4.0.4 and may also be
- present in other SVr4s) can step on the linesw[] table. The problem is that
- the domain name array initialization is wrong and too short; thus, when it's
- set, data past the end of the array can be stomped. To fix this, find the
- following near line 247:
-
- char srpc_domain[] = SRPC_DOMAIN;
-
- and change it to
-
- char srpc_domain[SYS_NMLN] = SRPC_DOMAIN;
-
- then rebuild the kernel.
- Microport officially knows about this bug and plans to fix it in a
- near-future update release. It has been fixed in Dell 2.2.
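
Why the missing dimension matters can be shown in user-space C. This is
only a sketch: DEMO_NMLN and DEMO_DOMAIN are stand-ins for SYS_NMLN and
SRPC_DOMAIN, their values are assumptions, and set_domain is a
hypothetical helper, not a kernel routine.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NMLN   257   /* assumption: stands in for SYS_NMLN */
#define DEMO_DOMAIN ""    /* assumption: stands in for SRPC_DOMAIN */

/* Sized by its initializer: room for the initial string only. */
char domain_unsized[] = DEMO_DOMAIN;

/* Explicit dimension: full capacity reserved whatever the initializer. */
char domain_sized[DEMO_NMLN] = DEMO_DOMAIN;

/* Copy name into dst only if it fits together with its terminating
   NUL. Returns 0 on success, -1 if the name is too long. */
int set_domain(char *dst, size_t cap, const char *name)
{
    size_t n;

    assert(dst != NULL && name != NULL);
    n = strlen(name) + 1;
    if (n > cap)
        return -1;
    memcpy(dst, name, n);
    return 0;
}
```

With an empty initializer the unsized array is one byte long, so any
real domain name written into it runs over whatever the linker placed
next.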
-
- 12. fread(3) does the wrong thing on pipes and FIFOs
- Ed Hall <edhall@rand.org> writes: "Unlike the raw read() system call,
- fread() is supposed to be able to make several partial reads to satisfy the
- data requested by its arguments. The exceptions are an EOF or an error on the
- stream. This characteristic is quite useful when moving data through pipes or
- over network connections, since partial reads are quite common in these cases.
- Well, the version of fread() in ESIX 4.0.3 (and likely other Sys5R4's) only
- does a single physical read, and if it only satisfies part of the requested
- number of bytes, that's all you get. This can sting you even if you carefully
- check the value returned by fread(), since the value returned is rounded down
- to the number of complete "nitems" read, although your position in the stream
- can be up to size-1 bytes beyond that point. Neither ferror() nor feof()
- indicate anything is wrong when this happens."
- This bug (which is also present in 4.0.4) is serious and nasty and should
- be high on every porting house's list to fix. It appears to be peculiar to
- USL 4.0.3 and 4.0.4; 4.0.2 does *not* have it, nor does SCO.
- A USL source claims it has been fixed in 4.1.
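
Code that has to run on an affected release can sidestep fread() on
pipes and sockets by looping over read(2) itself. A minimal sketch
follows; read_full is a hypothetical name, not a library routine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep reading until want bytes have arrived, EOF is seen, or an
   error occurs. Returns the number of bytes read (short only at
   EOF), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t want)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL);
    while (got < want) {
        ssize_t n = read(fd, p + got, want - got);
        if (n == 0)
            break;                  /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: retry */
            return -1;
        }
        got += (size_t)n;           /* partial read: go round again */
    }
    return (ssize_t)got;
}
```

Because it counts bytes rather than items, the caller always knows
exactly where it is in the stream.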
-
- 13. Process accounting is broken
- In 4.0.3, process accounting doesn't work. From examining the accounting
- scripts, it appears that /usr/lib/acct/accton is supposed to set a return code
- depending on whether accounting was switched on already or not. However, it
- always returns the same result - accounting switched off. This means that the
- /usr/lib/acct/ckpacct script, which is run every hour to keep the process
- accounting log in check, instead turns off accounting the first time it is run
- after booting. The same happens with the nightly /usr/lib/acct/monacct
- program.
- I don't yet know whether this bug is present in 4.0.4. It is definitely
- un-fixed in Dell 2.1 and Consensys 1.3. In Dell 2.2 the return bug is fixed,
- but accounting isn't automatically enabled at boot time.
-
- 14. tar(1) foos up in the presence of symbolic links
- Tar can get the names of symbolic links wrong when creating an archive.
- This bug can be demonstrated by doing the following:
-
- mkdir t
- cd t
- touch a 1234567890
- ln -s 1234567890 b
- ln -s a c
- tar vcf ../t.tar .
-
- The output generated by tar is:
-
- a ./ 0 tape blocks
- a ./a 0 tape blocks
- a ./1234567890 0 tape blocks
- a ./b symbolic link to 1234567890
- a ./c symbolic link to a234567890
-
- (Note the above commands should be done in the order shown and in a new
- directory) This bug is nasty. Recommended solution: use GNU tar.
- This is reported from Esix 4.0.3 and Consensys 1.3, but probably exists on
- other SVr4s as well.
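
The bad name in the listing looks like a short link target written over
the front of a longer one left in the same buffer. That is a guess; the
cause has not been confirmed. What is certain is that readlink(2) does
not NUL-terminate what it returns, so a program that reuses one buffer
must terminate it by hand. A sketch, with a hypothetical function name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the target of the symbolic link at path into buf and
   terminate it. At most cap - 1 bytes are stored. Returns the
   length stored, or -1 on error. */
ssize_t link_target(const char *path, char *buf, size_t cap)
{
    ssize_t n;

    assert(path != NULL && buf != NULL && cap > 0);
    n = readlink(path, buf, cap - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';      /* readlink() itself never does this */
    return n;
}
```

Called for ./b and then ./c from the example above with the same
buffer, this yields "1234567890" and then "a".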
-
- 15. Symbolic links can interfere with shellscript execution
- There is a problem running #! scripts when symbolic links are involved.
- Typing in the following from a command shell demonstrates the problem:
-
- mkdir a b
- ln -s a c
- cd a
- cat > script <<!
- #!/bin/sh
- echo Hello
- !
- chmod 755 script
- cd ../b
- ln -s ../c/script .
- ./script
-
- The message generated from the last line is:
-
- a/script: a/script: cannot open
-
- This is reported from Esix 4.0.3, Consensys 1.3, and Dell 2.2, but
- probably exists on other SVr4s as well.
-
- 16. Piping a csh builtin causes the shell to hang.
- While running csh, this can be demonstrated by some of the following:
-
- echo Hello | cat
- history | more
-
- (A solution to this one is to use tcsh-6.02.)
- This is reported from Esix 4.0.3 and Consensys 1.3, but probably exists on
- other SVr4s as well. It is reported fixed in Dell 2.2.
-
- 17. Quick port setup option in sysadm is broken
- In 4.0.3 sysadm, the quick port setup option, which is used to add and
- delete terminal ports, is seriously broken. The script modifies /etc/conf/*
- files, and has incorrect minor numbers, sets the 5th field of sdevice.d to Y
- when it should be N, and is missing columns for node.d. See
- /usr/sadm/sysadm/bin/q-add.
-
- 18. COFF binaries linked with curses(3) and shared libc hang
- ...eating the CPU. Cause unknown.
-
- 19. shl hangs, sxt devices bad
- shl(1) does not work. Try creating a layer and doing an 'ls'. Your session
- hangs. Bruce Momjian <root%candle.uucp@bts.com>, who reported this bug, says
- he believes it is the sxt devices which are broken. It definitely exists in
- Consensys 1.3.
-
- 20. num-lock prevents mouse from working properly
- When using the Motif window manager, if your num lock is on, your mouse
- clicks are not recognized by the window manager. The mouse still works in
- xterm(1). This is allegedly fixed in Destiny (4.2).
-
- 21. adjtime() doesn't work
- Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 adjtime() doesn't.
- Calling `date -a' works to adjust the time slowly.
-
- 22. ttymon drops DTR
- Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 the ttymon(1)
- utility for HDB uucp drops DTR every few weeks. The workaround is to disable
- and re-enable it.
-
- 23. cron mail doesn't go through aliasing
- Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6 cron mail to adm
- doesn't get redirected by the aliases file.
-
- 24. fragility in xterm
- Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6, doing ~! from
- a cu in xterm kills xterm. This has been fixed in Dell 2.2.
-
- 25. csh lossage due to bad optimization
- If a csh user sources a non-existent file in their .cshrc (eg, source .alias,
- where .alias doesn't exist), then the system will hang for a couple of minutes.
- Eventually the user gets an "Out of memory" error and the console logs "NOTICE:
- out of swap space - Insufficient memory to allocate 2 pages - system call
- failed".
- This appears to be due to over-optimization of code surrounding a longjmp
- call.
- (There are numerous other reports of memory leak bugs in csh).
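
The general hazard is easy to reproduce in a few lines of C. This
sketch is not csh code and the function names are hypothetical; it only
illustrates the rule that a local variable changed between setjmp() and
longjmp() has an indeterminate value afterwards unless it is declared
volatile, which is exactly what an optimizer is free to exploit.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf retry_point;

/* Simulate an operation that fails on its first attempt by jumping
   back to the retry point. */
static void fail_once(int attempt)
{
    if (attempt == 0)
        longjmp(retry_point, 1);
}

/* Returns the number of attempts needed. */
int attempts_until_success(void)
{
    /* volatile forces the count to live in memory, so the increment
       made before the longjmp is still visible after it. */
    volatile int attempts = 0;

    if (setjmp(retry_point) != 0) {
        /* Arrived here via longjmp; fall through and try again. */
    }
    attempts++;
    fail_once(attempts - 1);
    assert(attempts > 0);
    return attempts;
}
```

Without the volatile qualifier an optimizing compiler may keep the
counter in a register that longjmp() restores to its old value, and the
retry logic can then loop or misbehave.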
-
- 26. Bug in cp(1)
- If ``copy'' encounters a directory before a file, it dumps core ...
-
- --- cut ---
- cd /tmp
- mkdir copybug jnk
- cd jnk
- mkdir directory
- >file
- cp -r * /tmp/copybug
- --- cut ---
-
- This was reported from Consensys 4.0.3 but is probably a generic SVr4 bug.
-
- 27. tbl -me doesn't work
- Wolfgang Denk reports that trying to use "tbl -me" for any input file causes
- tbl to quit. The problem is that newer tbl versions don't accept [nt]roff
- control lines (".rm @W") after .TS.
-
- 28. who -r fragility leads to boot-time problems
- It coredumps if the name of the timezone is longer than three characters.
- This can be a real problem for European sites... and is potentially more
- hazardous than immediately apparent as _a lot_ of the initialization scripts
- (rc1.d, rc2.d) use ``who -r'' to see if the machine is in single- or multi-user
- mode. And when ``who'' bombs out, the ``set'' command is given an empty
- command-line and can't do much else than print the shell variables, $1-$9
- remain empty ... meaning that more or less all the scripts fail in various ways
- and the system has an exceptionally hard time coming up.
-
- 29. at(1) breaks here-documents in shell scripts
- at adds gratuitous empty lines to the job submitted by the user.
- This prevents shell here-documents from working.
-
- III. Networking and File-Sharing Bugs
-
- 1. NFS locking is unusably slow
- Randy Terbush <randy@dsndata.dsndata.com> has posted code which
- demonstrates a serious bug in the SVr4 NFS locking daemon.
- In his own words: "The symptoms are ~30% cpu usage by 'lockd' and
- severe slowing of the machines on the network. This program
- demonstrates that it takes ~20 seconds to obtain locks from an ailing
- 'lockd'. We have verified that this bug does not exist in HPUX 8.0x."
- Randy's code is too large to be included here. He is, quite
- rightly, exercised at USL's exceedingly slow response to this problem.
- The comment in his makefile reads, in part:
-
- # USL has admitted to the existence of this bug in version 4.0, 4.1,
- # and 4.2 of their distributed and yet to be released sources. This is
- # a network crippling problem that they have refused to fix until
- # release 4.3, which will be OVER 1 YEAR from today. (29 Oct 1992)
- # If your version of 'lockd' exhibits this same problem, I would
- # strongly urge you to contact your vendor and ask them to put some
- # pressure on USL to fix this problem. SVR4 is virtually useless in a
- # network of shared resources while this problem exists.
-
- 2. UFS file system problems
- In stock USL 4.0.3, you can't use a UFS file system as the root; the system
- hangs if you try. Consensys, Dell, Esix, Microport, MST, and UHC all
- appear to have fixed this.
- David Aitken, the UNIX product manager at UHC, writes "The ufs as root file
- system [problem] was not really a bug, just a little oversight on USL's part -
- we have fixed it completely by adding one line to the /stand/boot script:
- rootfstype=ufs!" He adds that they've been using ufs on their lab machines for
- over 10 months with no trouble, and the latest UHC release defaults to ufs if
- you have more than 120MB of disk.
-
- 3. Byte-order problem with NFS when accessing Sun disks
- Christoph Badura <bad@generics.ka.sub.org> notes that the stock USL resolver
- library suffers from serious confusion about the byte order in the
- sockaddr_in structure. This bug is acknowledged by USL for the 4.0.4
- release. A symptom of this bug is that Sun disks will not mount correctly over
- NFS. As a workaround, try removing the references to /usr/lib/resolv.so from
- /etc/netconfig and rebooting your system. Unfortunately, this will mean
- you can't use nameservers.
- Alan Batie <batie@agora.rain.com> writes: "Actually, you don't have to
- remove resolv.so, just put tcpip.so first and have a hosts file with the names
- of hosts you want to do NFS mounts from. This way you can use nameservers for
- most things."
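
For reference, the convention the resolver library is expected to
follow is that the port and address fields of a sockaddr_in always hold
network (big-endian) byte order, whatever the host CPU uses. A sketch
of filling one in correctly, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill in an Internet socket address from host-order values. */
void fill_inet_addr(struct sockaddr_in *sa, uint32_t host_order_ip,
                    uint16_t host_order_port)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(host_order_port);       /* 16-bit swap if needed */
    sa->sin_addr.s_addr = htonl(host_order_ip);  /* 32-bit swap if needed */
}
```

On a 386 the conversions swap bytes; on a big-endian Sun they do
nothing, which is why a missing conversion shows up only between the
two.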
-
- 4. Under weird circumstances, lseek on UFS may cause corruption
- Christoph Badura <bad@generics.ka.sub.org> reports that a UFS lseek() to an
- offset which is a multiple of 4096 but not a multiple of 8192, followed by a
- write(), may corrupt the file being written. The bug shows up only if the
- file has no pages in the page pool associated with it at the seek offset and at
- 4k before the seek offset. He has sent USL a kernel fix for this, which was
- included in 4.0.4.
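
On a pre-4.0.4 system, a cautious program can at least recognize the
offsets at risk. The sketch below, with a hypothetical function name,
tests only the arithmetic condition given above; the state of the page
pool cannot be checked from user code.

```c
#include <assert.h>

/* Return 1 if offset is a multiple of 4096 but not of 8192. */
int is_suspect_seek_offset(long offset)
{
    assert(offset >= 0);
    return offset % 4096 == 0 && offset % 8192 != 0;
}
```

Offsets such as 4096 and 12288 qualify; 0, 8192 and 16384 do not.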
-
- 5. FTP problems
- The in.ftpd on SVR4.0.3 does not support all the commands listed in RFC 959.
- When recent SCO UNIX/ODT versions ftp to SVR4.0.3, the SVR4 side will refuse,
- drop the connection, and core dump after you authenticate. This is because the
- SCO end sends the 'SYST' command ala RFC 959, and the SVR4.0.3 end doesn't
- recognise it. Some ports have fixed this.
- Christoph Badura adds: "The bug is due to a longjmp(3) on a sigjmpbuf obtained
- by sigsetjmp(3). ARGH. Testing led to a bug in the original BSD sources, which
- is still present in the NET/2 ftpd. "
-
- 6. A bug in the WD80x3 support
- MST reports a serious bug in the SVr4 kernel support for this card. Here's
- how to reproduce it:
-
- server: init 3 and share (export) /usr for example.
-
- client: mount -F nfs server:/usr /mnt
- cd /mnt
- find . -print | cpio -ocBuv > /dev/null
-
- what happens:
- server and client will "hang" together.
-
- "cue":
- hit keys on server and/or client, hang will go away
- for 10-20 seconds temporarily. Yanking the BNC connectors
- does the same trick.
-
- They say they've heard from customers that this happens on Dell, UHC as well
- as USL 4.0.4. PCNFS/BWNFS network xcopy suffers this as well. Client can be a
- Sun Sparc for that matter.
-
- IV. SCSI Support Problems
-
- 1. sar is confused by SCSI
- Sar -d doesn't work on SCSI drives. Dell fixed this in 2.1 and it's
- reported to work OK in Esix 4.0.3A; no report of any other SVr4 having fixed
- this yet. SCO fixed it in 3.2.4.
-
- 2. A configuration problem
- Stock USL requires you to jumper your SCSI devices to fixed IDs
- during installation (it can be changed to any other ID after).
- Dell says they've fixed this. The requirement is definitely still present
- in Esix and Consensys 1.3. UHC thinks they've fixed this, but their 4.0.3.6
- release still seems to demand ID 1 to install.
-
- 3. Synchronous SCSI hang problem
- David Wexelblat <dwex@mtgzfs3.att.com> reports: "Stock SVR4.0.3 will hang
- the SCSI bus with a 1542 in synchronous mode. Dell fixed this, and this has
- been given to Microport [ed note: Microport 4.0.4 and Consensys 4.0.3 have
- fixed the problem; MST UNIX and Esix 4.0.3 still have this problem; I have not
- yet been able to determine if ESIX 4.0.4 does]. In the file /sbin/bcheckrc,
- change the line:
-
- echo MARK > /dev/rswap
-
- to
-
- echo MARK | dd of=/dev/rswap bs=512 conv=sync > /dev/null 2>&1
-
- The magic is apparently the conv=sync, which forces a 512 byte block
- to be written. The original echo writes 4 bytes, which apparently causes
- synchronous SCSI to go out to lunch.
-
- Now, you ask, how can I fix this, since the system won't boot? There are
- a couple of methods. First, if possible, disable synchronous negotiation
- (1542 jumper J5-1 removed, plus whatever you may need to do to your drive).
- Then boot up, edit /sbin/bcheckrc, then shutdown, restrap for synchronous,
- then reboot. Everything should be OK.
-
- That's the easy way. Unfortunately, some hard drives will only work
- in synchronous mode. Well, you can still recover from this phenomenon.
- Here's how:
-
- 1) Install on your hard drive
- 2) Boot from the first boot floppy. When it tells you to, insert
- the second boot floppy. At the first prompt, hit <DEL> to
- break out to a shell.
- 3) Mount your hard drive under /mnt with the following command
- (replace FS-TYPE with s5, s52, or ufs, whichever you used
- for your root partition):
-
- /etc/fs/FS-TYPE/mount /dev/dsk/c0t0d0s1 /mnt
-
- 4) Now edit /mnt/sbin/bcheckrc:
-
- ed /mnt/sbin/bcheckrc
-
- You may want the 'ed' man page handy (I barely remember how
- to use 'ed' :->). For simplicity, you can delete/comment out
- the offending line, then replace it with the correct line later.
- 5) Unmount the hard drive:
-
- umount /mnt
-
- 6) Reboot from the hard drive. Everything should come up OK, and
- you can finish editing /sbin/bcheckrc, if necessary.
-
- Note that you perform these actions at your own risk. The first version was
- performed by me on Microport SVR4, and the second was performed by someone
- else (on my suggestion) on ESIX SVR4."
- This problem appears to be fixed on Consensys 1.3 and Dell 2.1.
-
- 4. ps chokes on commands that do SCSI I/O
- Hugh Stearns <hoyt@isus.tnet.com> reports that in 4.0.3.6, ps
- doesn't work when a SCSI command is in progress. It stops printing at the
- process executing the SCSI command.
- This is still broken in Dell 2.2.
-
- 5. Transfer speed problems with Adaptec 1542B on 486s
- If a system mount or install fails, try setting the DMA speed to 5MB/s,
- rather than the default 5.7MB/s. This is accomplished by removing the jumper
- shorting the 12th pin pair of jumper block 5.
-
- V. Development Tools Problems
-
- 1. General UCB library brokenness
- The BSD compatibility libraries were badly broken in USL code. A Dell
- source adds "That meant that almost all the apps derived from them were broken
- too. Most stuff like automount will die when you send a SIGHUP, instead of
- rereading the map file. You can get a system into very strange states when
- that happens."
- John Sully <jms@mport> of Microport opines: "This is a bug in automount
- itself rather than BSD compatibility, since the automount which comes with SVR4
- is not compiled with the BSD libraries. (isn't this comforting?? :-()."
-
- Esix and UHC's BSD libraries are USL stock. I don't yet know
- the status of other ports. Microport has run into things they think may be
- symptoms of this but have no fix yet.
-
- John Sully <jms@mport> of Microport counters with: "One common thread I find
- on reading of these problems is that the BSD compatibility libraries are
- *misused*. [...] The problem is that BSD and SYSV have similarly named .h files
- which sometimes contain different definitions for objects with the same name.
- This has been known to cause all sorts of problems because the SYSV headers are
- picked up and then the calls are satisfied from the BSD library rather than the
- shared object library. I have found that if you use /usr/ucb/cc that the BSD
- compatibility is much less broken than it would seem at first because it
- ensures that the correct headers are picked up."
-
- However, note that there is at least one *real* bug known --- as of 4.0.4
- the signal emulation cannot explicitly set a handler to SIG_DFL or SIG_IGN.
-
- Ron Guilmette <rfg@ncd.com> writes "[Library lossage] may be easily
- demonstrated by attempting to build and link the GNU C compiler with
- `-L/usr/ucblib -lucb'. The resulting compiler will most certainly
- crash and die." John Sully thinks this is because the /usr/ucb/cc
- compiler should have been used, but wasn't.
-
- 2. USL emulation of BSD signals doesn't work
- A different source reports that the USL implementation of BSD signals
- is broken in both 4.0.3 and 4.0.4; in particular, the sigvec() family doesn't
- work properly. It is possible to make minor tweaks to source to make such apps
- work properly with the native USL signals implementation.
-
- Here's more on the signals problem, thanks to Richard <rc@siesoft.co.uk>:
- ------------------------------------------------------------------------------
- The problem is to do with the signal() function that is within the BSD
- compatibility libc.
-
- To reproduce the problem do the following:
-
- #include <stdio.h>
- #include <sys/types.h>
- #include <signal.h>
- #include <sys/siginfo.h>
-
- main()
- {
-     signal(SIGPIPE,SIG_IGN);
-     pause();
- }
-
- and compile it with cc xx.c -o xx /usr/ucblib/libucb.a
-
- (John Sully observes that this is definitely wrong; /usr/ucb/cc should have
- been used rather than "cc ... -L/usr/ucblib -lucb" or the equivalent "cc ...
- /usr/ucblib/libucb.a".)
-
- If you run the program and then signal it with a SIGPIPE, the program
- will die, even though you've told it to ignore SIGPIPE.
-
- The fix is difficult unless you've got source because there's a missing 'else'
- clause from the signal() code. This is the only signal fault I've found in
- the BSD signal functions, details of the rumoured sigvec problem would be
- useful?
-
- If you're trying to compile an application you could change the application
- code to do the following; this does work.
-
- void
- catch(s)
-     int s;
- {
-     /* DO NOTHING */
-     ;
- }
-
- main()
- {
-     signal(SIGPIPE,catch);
-     pause();
- }
-
- SUMMARY
- You can only change a signal handler to a function handler, any number of
- times. Any attempt to set the handler to SIG_DFL, or SIG_IGN will fail.
-
- This bug has given some people working with X11R5 aggro, causing the X server
- to die when you close a client.
-
- Christoph Badura <bad@flatlin.ka.sub.org> confirms this bug.
- He has sent USL a source fix. It appears already to have been fixed in Dell
- 2.2.
- ------------------------------------------------------------------------------
-
- 3. Possible string library problems
- There are also persistent rumors of problems in the BSD-emulation string
- libraries. I have not been able to pin down specifics on this.
-
- 4. USL's ndbm support is broken.
- Christoph Badura <bad@generics.ka.sub.org> reports "The ndbm functions in
- the ucb library are broken [apparently due to a compiler or optimizer bug in cc
- -- ed.]. Try making the whatis data base for /usr/share/man with Tom
- Christiansen's perl rewrite of man.
- The easiest way to fix this is to compile GNU's replacement ndbm.c with gcc
- -fpcc-struct-return -traditional (gcc1.40 or 2.2 will do nicely) and install it
- in your C library. Source is available for FTP from prep.ai.mit.edu.
-
- 5. An include file is missing
- Both 4.0.3 and 4.0.4 USL versions are missing the documented dial.h
- file from their /usr/include directory. Dell 2.1 has it.
-
- 6. sscanf(3) has a potential bug
- Anthony Shipman <als@bohra.cpg.oz.au> reports: " I found the following bug
- in SCO Unix 3.2.* and I think it may be common to many AT&T derived Unixes.
-
- sscanf() calls _doscan() to read from a pretend file. The file
- uses the string as a buffer and a fake file descriptor of 60 (=_NFILE).
- Since _NFILE (for SCO UNIX) is 60 it assumes that fd 60 can never be open.
-
- Then when fscanf() hits the end of the string it calls _filbuf() to read
- into the buffer (which is the string) from fd 60. This should fail with
- an errno=9 and then _filbuf() sets EOF and it all terminates.
-
- However in SCO Unix you can reconfigure the kernel to increase the number
- of files per process to a recommended maximum of 150. If you do this then
- your program might have fd 60 open one day. Then sscanf() will read from this
- file overwriting your string. The byte count to the read() in _filbuf()
- is some undefined but large value so a lot of memory will be overwritten. In
- my case the string was on the stack so my stack was wiped.
-
- In short if you configure your kernel to have NOFILES > _NFILE ie more than
- the default then sscanf() is a time bomb in your code."
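The failure described above depends on whether the descriptor numbered _NFILE happens to be open when sscanf() runs off the end of its string. A minimal sketch of a check for that condition follows, assuming POSIX fcntl(2) is available. The helper name fd_is_open is hypothetical, and calling it before sscanf() is an illustrative idea rather than part of the original report.

```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: report whether descriptor `fd` is currently open.
 * fcntl(F_GETFD) fails with EBADF (the "errno=9" mentioned in the report)
 * when the descriptor is not open. Returns 1 if open, 0 if not. */
int fd_is_open(int fd)
{
    if (fcntl(fd, F_GETFD) != -1)
        return 1;
    return errno != EBADF;
}
```

On an affected system, a program could test fd_is_open(60) (or whatever _NFILE is on that port) at the points where it parses with sscanf(), and refuse to continue if the answer is 1.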
-
- 7. Compiler problems
- Ronald Guilmette <rfg@ncd.com> also reports the following:
-
- ------------------------------------------------------------------------------
- /* Here is a bug in the original SVR4 C compiler (aka C Issue 5) which
- effectively prevents you from making good use of the `const' and
- `volatile' qualifiers defined by ANSI C in conjunction with pointer
- types and typedef statements. Compile this code and you will get:
-
- "qualifiers.c", line 23: left operand must be modifiable lvalue: op "="
-
- ...if your copy of the svr4 C compiler still has the bug. Note that
- given these declarations, the ANSI C standard says that the thing pointed
- to by the variable `pci' should be considered to be constant... not the
- variable `pci' itself. (The GCC compiler, either version 1.x or version
- 2.x, correctly compiles this example without complaint.)
- */
-
- typedef const int *ptr_to_const_int;
-
- ptr_to_const_int pci;
-
- int i;
-
- void main ()
- {
-     pci = &i;
- }
- ------------------------------------------------------------------------------
- /* Here is a subtle bug in the original SVR4 C compiler (aka C Issue 5)
- which prevents you from first declaring a tagged type (i.e. a struct
- type or a union type) in a parameter list, and then defining that tagged
- type later on within the same scope. (Note that according to the ANSI C
- standard, the scope in which parameters get declared and the outermost
- block of a function body are one and the same scope. Thus, this really
- is legal ANSI C code!)
-
- Try compiling this with your C compiler on SVR4. If your compiler still
- has the bug, you will get:
-
- "tagged_type.c", line 24: warning: dubious tag declaration: struct S
- "tagged_type.c", line 28: warning: improper member use: i
- "tagged_type.c", line 28: warning: improper member use: i
- "tagged_type.c", line 31: warning: dubious tag declaration: struct S
- "tagged_type.c", line 35: warning: improper member use: i
- "tagged_type.c", line 35: warning: improper member use: i
-
- (The GCC compiler also had this bug in version 1.x, but it has been fixed
- in version 2.x.)
- */
-
- void foobar1 (arg) /* use old-style without prototypes */
-     struct S *arg;
- {
-     struct S { int i; }; /* define the type `struct S' */
-
-     arg->i = arg->i; /* legal according to ANSI C rules! */
- }
-
- void foobar2 (struct S *arg) /* use new-style with prototypes */
- {
-     struct S { int i; }; /* define the type `struct S' */
-
-     arg->i = arg->i; /* legal according to ANSI C rules! */
- }
- ------------------------------------------------------------------------------
- /* Here is a serious bug in the original SVR4 `dump' program which dumps
- out parts of object files in either plain hex form or symbolically.
-
- To see the `dump' program get a segfault and die, save this code under
- the name `dump-bug.c' and then do:
-
- cc -g -c dump-bug.c
- dump -v -D dump-bug.o
-
- The bug arises whenever `dump' tries to read Dwarf debugging information
- for an array of pointers to any "user defined" type (e.g. `struct S' in
- this example). Past that point, `dump' is totally confused, so further
- Dwarf debugging information finally causes it to go belly-up.
- */
-
- struct S { int i; };
- struct S *array[10];
- int j;
- ------------------------------------------------------------------------------
- It appears that the svr4 C compiler (for x86 machines) doesn't conform real
- well to either the letter or the spirit of the IEEE 754 floating-point
- standard. In particular, "unordered comparisons" and other operations on
- NaNs don't always produce the result that the IEEE 754 standard calls
- for.
-
- An AT&T source comments: "This is documented in the SVID as a future direction.
- We do not support NaNs in -Xa and -Xt modes, only in -Xc. Try
- isnan(sqrt(-1.0)) to determine which modes support it."
- ------------------------------------------------------------------------------
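The AT&T comment suggests probing NaN support separately in each compilation mode. A minimal sketch of such a probe follows. It generates the NaN as 0.0/0.0 instead of sqrt(-1.0), a substitution made here so the example needs no math library, and it tests for NaN with a self-comparison instead of isnan(). All three function names are hypothetical.

```c
/* Produce a NaN at run time. `volatile` keeps the compiler from
 * evaluating the division at compile time. */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* Under IEEE 754 a NaN is the only value that compares unequal to itself. */
int is_nan(double x)
{
    return x != x;
}

/* Returns 1 if every ordered comparison involving `nan` is false and
 * `nan != y` is true, which is what IEEE 754 requires when one operand
 * is a NaN (an "unordered" comparison). Returns 0 otherwise. */
int unordered_compares_ok(double nan, double y)
{
    return !(nan < y) && !(nan > y) && !(nan <= y) && !(nan >= y)
        && !(nan == y) && (nan != y);
}
```

Compiling a small driver around these functions once with each of -Xa, -Xt and -Xc, and comparing what is_nan(make_nan()) and unordered_compares_ok(make_nan(), 1.0) return, should show which modes follow the standard on a given port.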
-
- The compiler fails to issue diagnostics for cases where a floating point
- literal is given which exceeds the range of its type (either float or
- double). Actually this one could be argued either way, since IEEE FP
- format includes "infinities" and the compiler probably just changes any
- FP value which is out of range for its type into either positive infinity
- or negative infinity (as appropriate).
-
- The compiler fails to issue diagnostics in cases where a typedef name is
- reused to declare a formal parameter, as in:
-
- -----------------------------------------------------------------------
- typedef int FOO;
- void bar (FOO)
-     int FOO;
- {
- }
- -----------------------------------------------------------------------
-
- The compiler crashes on the following invalid input:
-
- -----------------------------------------------------------------------
- int i;
- volatile void *pvv;
-
- void pvv_test ()
- {
-     (i ? *pvv : *pvv); /* ERROR */
- }
- -----------------------------------------------------------------------
-
- The compiler fails to issue diagnostics for cases where an attempt is
- made to "forward declare" an enum type (without also defining it), as
- in:
-
- -----------------------------------------------------------------------
- enum enum0 *ep; /* ERROR */
- -----------------------------------------------------------------------
-
- The compiler rejects the following code with an error, although there
- seems to be no good reason why it should (because no object is being
- declared).
-
- -----------------------------------------------------------------------
- #include <limits.h>
-
- typedef char array_type[ULONG_MAX];
- -----------------------------------------------------------------------
-
- VI. The FUBYTE Problem
-
- (Thanks to Christoph Badura <bad@flatlin.ka.sub.org> for this info)
-
- The kernel function fubyte() is documented to return a positive value when
- given a valid user space address and -1 otherwise. In the latter case u.u_error
- is set to EFAULT. USL SysV R4.0.3 has a sign extension bug in the
- implementation of fubyte() for local file descriptors (i.e. not opened via
- RFS), which causes fubyte() to return negative values if the byte fetched has
- its high bit set. This bug doesn't affect STREAMS drivers, as they don't call
- (and in fact are normally unable to call) fubyte(). Thus writing a byte with
- the high bit set to certain character device drivers returns with -1 and errno
- set to EFAULT.
-
- The bug may affect any character device driver that calls fubyte(). It's not
- limited to serial card drivers. The bug is noticed most often with serial card
- drivers, since uucp uses byte values > 127 very early during g-protocol setup
- and drivers for serial cards tend to use fubyte() quite often.
-
- Note also that the bug's effect is different if the driver checks for a -1
- return value of fubyte() or just a negative one. In the former case it is
- possible to pass bytes with the 8th bit set through fubyte(), except for 0xff
- which is -1 in two's complement. That makes the bug more obscure.
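The difference between the two driver checks can be made concrete with a small user-space model. This is only a sketch: buggy_fubyte() is a hypothetical stand-in that reproduces the sign extension on a byte value, not the kernel routine, and the two counting functions are likewise invented for illustration.

```c
/* Hypothetical model of the buggy fubyte(): the fetched byte comes back
 * sign-extended, so values 0x80..0xff map to -128..-1. */
int buggy_fubyte(unsigned char byte)
{
    return byte >= 0x80 ? (int)byte - 256 : (int)byte;
}

/* Number of byte values a driver loses if it treats exactly -1 as an error. */
int rejected_by_minus_one_check(void)
{
    int b, n = 0;

    for (b = 0; b < 256; b++)
        if (buggy_fubyte((unsigned char)b) == -1)
            n++;
    return n;
}

/* Number of byte values a driver loses if it treats any negative
 * return value as an error. */
int rejected_by_negative_check(void)
{
    int b, n = 0;

    for (b = 0; b < 256; b++)
        if (buggy_fubyte((unsigned char)b) < 0)
            n++;
    return n;
}
```

In this model the first check loses only 0xff, while the second loses every byte from 0x80 up, which matches the observation that the bug is harder to spot in drivers that compare against -1.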
-
- The fix is easy. First, make a backup copy of the kernel object file
- /etc/conf/pack.d/kernel/vm.o! A disassembly of vm.o(lfubyte) should reveal
- *exactly* one mov[s]bl (move byte to long w/sign extend). That one needs to be
- patched into a movzbl (zero extend). The difference is one bit in the second
- byte of the opcode.
-
- The movsbl has the bit pattern 00001111 1011111w mod/rm-byte.
- The movzbl has the bit pattern 00001111 1011011w mod/rm-byte.
-
- The 'w' bit is 0 for the instruction in question. So the opcodes are 0f be and
- 0f b6. Here is the diff -c from dis -F lfubyte showing the patch applied to
- the Dell 2.1 kernel:
-
- *** vm.o Mon Mar 9 00:31:38 1992
- --- vm.o.org Mon Mar 9 00:32:40 1992
- ***************
- *** 22,28 ****
- 11c90: 85 c0 testl %eax,%eax
- 11c92: 75 09 jne 0x9 <11c9d>
- 11c94: 8b 45 08 movl 8(%ebp),%eax
- ! 11c97: 0f b6 00 movzbl (%eax),%eax
- 11c9a: 89 45 fc movl %eax,-4(%ebp)
- 11c9d: c7 05 d8 13 00 00 00 00 00 00 movl $0x0,0x13d8
- 11ca7: 83 3d dc 13 00 00 00 cmpl $0x0,0x13dc
- --- 22,28 ----
- 11c90: 85 c0 testl %eax,%eax
- 11c92: 75 09 jne 0x9 <11c9d>
- 11c94: 8b 45 08 movl 8(%ebp),%eax
- ! 11c97: 0f be 00 movsbl (%eax),%eax
- 11c9a: 89 45 fc movl %eax,-4(%ebp)
- 11c9d: c7 05 d8 13 00 00 00 00 00 00 movl $0x0,0x13d8
- 11ca7: 83 3d dc 13 00 00 00 cmpl $0x0,0x13dc
-
- Of course there is a workaround at the driver level. Canonically, one would do
- this by checking for fubyte() returning -1 *and* u.u_error being set to EFAULT
- (u.u_error is cleared upon entering a system call). However, in R4.0.3
- fubyte() does NOT set u.u_error. It *does* set u.u_fault_catch.fc_errno.
-
- Christoph reports that Dell V.4 can be object-patched successfully to fix this.
- I'm told that the offending 11c97 is at exactly the same address in the
- Consensys 1.3 kernel. I do not know the status of the other ports.
-
- Another poster (Marc Boucher <marc@cam.org>) adds:
-
- On ESIX SVR4.0.3 Rev. A, the instruction movsbl in question can be changed to
- movzbl (as described above) with a binary-editor on file
- /etc/conf/pack.d/kernel/vm.o. At offset 0x11eb0, change 0xbe to 0xb6.
-
- Before patching, verify that your /etc/conf/pack.d/kernel/vm.o is the same as
- mine! On my system, the /bin/sum generated checksum of vm.o was "4440 222".
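Since the offset differs between ports, it is worth confirming the bytes at the patch site before changing anything. A minimal sketch of that check follows, working on a copy of the object file already read into memory. The function name patch_movsbl and the macro names are hypothetical, and the sketch assumes the quoted offset points at the second opcode byte (the 0xbe), with the 0x0f escape byte immediately before it.

```c
#include <stddef.h>

#define OPC_ESCAPE 0x0f   /* first opcode byte shared by movsbl and movzbl */
#define OPC_MOVSBL 0xbe   /* second opcode byte: sign-extending move */
#define OPC_MOVZBL 0xb6   /* second opcode byte: zero-extending move */

/* Hypothetical helper: patch the movsbl whose second opcode byte sits at
 * `off` in `image` (length `len`) into a movzbl.
 * Returns 0 if the patch was applied, -1 if the bytes found there are not
 * the expected 0f be pair (wrong offset, wrong file, or already patched). */
int patch_movsbl(unsigned char *image, size_t len, size_t off)
{
    if (off == 0 || off >= len)
        return -1;
    if (image[off - 1] != OPC_ESCAPE || image[off] != OPC_MOVSBL)
        return -1;
    image[off] = OPC_MOVZBL;
    return 0;
}
```

A refusal (return value -1) at the expected offset is a strong hint that the vm.o in hand is not the one the offset was taken from.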
-
- The problem results from a sign-extension bug. The function lfubyte(), which
- is called by fubyte(), is declared as
-
- int lfubyte(char *addr); /* actually caddr_t */
-
- The byte is fetched with
-
- val = *addr;
-
- which triggers sign extension. Casting addr to an unsigned char * or declaring
- it as such solves the problem.
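The source-level cause and fix can be sketched in user space as follows. The two function names are hypothetical, and the sketch uses signed char explicitly because the signedness of plain char is implementation-defined; on the compilers discussed here plain char evidently behaves as signed.

```c
/* Fetch through a signed char pointer: the byte is sign-extended when it
 * is widened to int, as in the buggy lfubyte(). */
int fetch_sign_extended(const void *addr)
{
    const signed char *p = (const signed char *)addr;
    return *p;
}

/* Fetch through an unsigned char pointer: the byte is zero-extended, so
 * the result is always in the range 0..255. */
int fetch_zero_extended(const void *addr)
{
    const unsigned char *p = (const unsigned char *)addr;
    return *p;
}
```

For a byte such as 0xe9 the first function yields a negative number and the second yields 233; for bytes below 0x80 the two agree.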
-
- This bug is still present in stock USL 4.0.4. However, it has been fixed in
- Dell 2.2.
-
- VII. Destiny and Dell
-
- A source at UNIX System Labs Europe claims that `Destiny' (the new Release
- 4.2) incorporates all of Dell UNIX's fixes to 4.0.3; thus, any bug for which a
- Dell fix is indicated above should be gone in Destiny.
- --
- Send your feedback to: Eric Raymond = esr@snark.thyrsus.com
-